Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
培训最先进模型所需的基础设施变得过于昂贵,这使得培训此类模型仅适用于大型公司和机构。最近的工作提出了几种协作培训此类模型的方法,即通过将许多独立方的硬件汇总在一起,并通过Internet培训共享模型。在此演示中,我们合作培训了类似于Openai Dall-E的文本到图像变压器。我们邀请观众加入正在进行的训练运行,向他们展示有关如何使用可用硬件贡献的说明。我们解释了如何应对与此类训练运行相关的工程挑战(缓慢的沟通,有限的内存,设备之间的性能不均和安全问题),并讨论了观众如何设置协作培训。最后,我们表明所得模型在许多提示上生成了合理质量的图像。
translated by 谷歌翻译
变压器已成为机器学习的重要主力,并具有许多应用。这需要开发可靠的方法来提高其透明度。已经提出了多种基于梯度信息的多种可解释性方法。我们表明,变压器中的梯度仅在本地反映该函数,因此无法可靠地确定输入特征对预测的贡献。我们将注意力头和分层确定为这种不可靠的解释的主要原因,并提出了通过这些层传播的一种更稳定的方式。我们的建议在理论上和经验上都显示出良好的LRP方法的适当扩展,以克服简单基于梯度的方法的缺乏,并实现先进的解释绩效在广泛的变压器模型和数据集上。
translated by 谷歌翻译
最近已被证明大型语言模型在各种任务集中获得合理的零射普通化(Brown等,2020)。它已经假设这是语言模型的隐式多任务学习的结果,在语言模型中的预押(Radford等,2019)。可以通过明确的多任务学习直接引起零拍常规化?为了以缩放测试这个问题,我们开发一个系统,以便轻松地将任何自然语言任务映射到人类可读的提示表单中。我们转换一组大量的监督数据集,每个数据集都有多个提示,具有不同的措辞。这些提示的数据集允许基准测试模型执行完全看不见的任务的能力。我们介绍了一个普拉克尔编码器 - 解码器模型(Raffel等,2020; Lester等,2021),覆盖各种任务。该模型在多个标准数据集中达到强大的零点性能,通常优于其尺寸的型号超过16倍。此外,我们的方法对来自Big-替补基准测试的任务子集具有强烈性能,优于其尺寸的6倍。所有提示和培训的型号都可以在https://github.com/ bigscience-workshop / protectsource / httpsource / https://huggingface.co/bigscience/t0pp。
translated by 谷歌翻译
现代深度学习应用程序需要越来越多地计算培训最先进的模型。为了解决这一需求,大型企业和机构使用专用的高性能计算集群,其建筑和维护既昂贵又远远超出大多数组织的预算。结果,一些研究方向成为几个大型工业甚至更少的学术作用者的独家领域。为了减轻这种差异,较小的团体可以汇集他们的计算资源并运行有利于所有参与者的协作实验。这种范式称为网格或志愿者计算,在众多科学领域看到了成功的应用。然而,由于高延迟,不对称带宽以及志愿者计算独特的几个挑战,使用这种用于机器学习的方法是困难的。在这项工作中,我们仔细分析了这些约束,并提出了一种专门用于协作培训的新型算法框架。我们展示了我们在现实条件下的SWAV和Albert预先预价的方法的有效性,并在成本的一小部分中实现了与传统设置相当的性能。最后,我们提供了一份成功的协作语言模型预先追溯的详细报告,有40名参与者。
translated by 谷歌翻译
增加制药实验室和生产设施的自动化水平起着至关重要的作用。然而,这一领域的特殊要求使其挑战适应其他行业中存在的尖端技术。本文概述了相关方法以及如何在制药行业中使用,特别是在发展实验室中。最近的进步包括能够处理能够处理复杂任务的灵活移动机械手。然而,由于接口的多样性,将来自许多不同供应商的设备集成到端到端的自动化系统中是复杂的。因此,在本文中考虑了各种标准化方法,提出了一种概念来进一步服用一步。该概念使具有视觉系统的移动操纵器能够“学习”每个设备的姿势,并利用来自通用云数据库的条形码 - 获取接口信息。该信息包括控制和通信协议定义以及操作设备所需的机器人操作的表示。为了定义与设备相关的动作,设备必须具有 - 除了条形码 - 作为标准的基准标记。在随访论文中的适当研究活动之后,将详细阐述该概念。
translated by 谷歌翻译
Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered stateof-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/ huggingface/transformers.
translated by 谷歌翻译
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-theedge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller generalpurpose language representation model, called DistilBERT, which can then be finetuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
translated by 谷歌翻译
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/ fungraph/neural_catacaustics/
translated by 谷歌翻译
Edge computing is changing the face of many industries and services. Common edge computing models offload computing which is prone to security risks and privacy violation. However, advances in deep learning enabled Internet of Things (IoTs) to take decisions and run cognitive tasks locally. This research introduces a decentralized-control edge model where most computation and decisions are moved to the IoT level. The model aims at decreasing communication to the edge which in return enhances efficiency and decreases latency. The model also avoids data transfer which raises security and privacy risks. To examine the model, we developed SAFEMYRIDES, a scene-aware ridesharing monitoring system where smart phones are detecting violations at the runtime. Current real-time monitoring systems are costly and require continuous network connectivity. The system uses optimized deep learning that run locally on IoTs to detect violations in ridesharing and record violation incidences. The system would enhance safety and security in ridesharing without violating privacy.
translated by 谷歌翻译